
Request metrics improvements #8420

Merged
merged 18 commits into from
Jan 16, 2025
Conversation

enahum
Contributor

@enahum enahum commented Dec 17, 2024

Summary

More improvements to the network metrics, including speed, effective latency, and, more importantly, some changes to the entry fetching endpoints.

Ticket Link

https://mattermost.atlassian.net/browse/MM-61878

Release Note

NONE

Update groupLabel to use a type definition
Include effective latency and average speed in the metrics
Include parallel and sequential request counts
Member

@isacikgoz isacikgoz left a comment


Added my comments wherever I have an idea :)

}

const urlData = Object.entries(groupData.urls);
const sumLatency = urlData.reduce((result, url) => (result + (url[1].metrics?.latency ?? 0)), 0);
Member

Should we discard 0 latency from the equation? Ideally it should never happen, though, right?

Contributor Author

It should never happen, really; the 0 here is only there to be type safe.

const effectiveParallelLatency = Math.max(...parallelLatencies, 0);
const effectiveSequentialLatency = sequentialLatencies.reduce((sum, latency) => sum + latency, 0);

const effectiveLatency = effectiveParallelLatency + effectiveSequentialLatency;
Member

Do these need to be summed or should we get the avg especially for sequential latency?

Contributor Author

As far as I'm aware, for parallel requests the effective latency is the maximum latency among all the requests, while sequential requests do add up: if you execute two sequential requests and one takes 3s and the other 5s, then you waited a total of 8s.

That is why I also added the overall average latency.
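The rule described above can be sketched as a small standalone function (the name `effectiveLatency` is illustrative, not the PR's actual helper):

```typescript
// Parallel requests overlap, so the group is only as slow as its slowest member;
// sequential requests run back to back, so their latencies add up.
function effectiveLatency(parallelLatencies: number[], sequentialLatencies: number[]): number {
    // Math.max() of an empty spread is -Infinity, so 0 is included as a floor.
    const parallelPart = Math.max(...parallelLatencies, 0);
    const sequentialPart = sequentialLatencies.reduce((sum, l) => sum + l, 0);
    return parallelPart + sequentialPart;
}

// Two sequential requests of 3000ms and 5000ms -> 8000ms total wait.
console.log(effectiveLatency([], [3000, 5000])); // 8000
// Three parallel requests -> only the slowest (700ms) matters.
console.log(effectiveLatency([200, 700, 350], [])); // 700
```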

const averageSpeedMbps = averageSpeedBps / 1_000_000; // Convert to Mbps

return {
averageSpeedMbps,
Member

I guess we can use the average speed to decide whether to download files and so on.

performance.mark('tti', options);
}

public measureTimeToInteraction() {
Member

Yay, the long-anticipated metric is here :)

Contributor Author

Yes, but it is nothing more than console output at the moment. I need to figure out where and how to include it.
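For context, one way the `tti` mark could eventually be turned into a reportable number is via the standard User Timing API (a sketch; the mark names besides `tti` are illustrative, and React Native's performance polyfill may differ from the web API):

```typescript
// Record a start-of-launch mark, then a mark when the app becomes interactive.
performance.mark('appLaunch');
// ... startup work happens here ...
performance.mark('tti');

// measure() computes the duration between the two marks and stores it as an entry.
const tti = performance.measure('timeToInteraction', 'appLaunch', 'tti');
console.log(`TTI: ${tti.duration.toFixed(1)}ms`); // could be sent as telemetry instead of logged
```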

Contributor

Should I create another metric on the server to collect this?

Contributor Author

@rahimrahman can it be optional? I think that would be very nice to have, but I haven't looked into how to include it, as it is a totally different metric collected in a different way and elsewhere in the code.


* feat(MM-61865): send network info telemetry data

* unit testing

* fix latency vs effectiveLatency

* cleanup test

* fix failing tests

* fix spelling error

* fix failing test

* more cleanup
Comment on lines 115 to 179
categorizeRequests(groupLabel: RequestGroupLabel): CategorizedRequests {
const parallel: ClientResponseMetrics[] = [];
const sequential: ClientResponseMetrics[] = [];
const groupData = this.requestGroups.get(groupLabel);
if (!groupData) {
return {parallel, sequential};
}

// Sort requests by start time
const requestsMetrics = Object.entries(groupData.urls).map((e) => e[1].metrics).filter((m) => m != null);
requestsMetrics.sort((a, b) => a.startTime - b.startTime);

let lastEndTime = 0;

for (const metrics of requestsMetrics) {
if (metrics.startTime < lastEndTime) {
// Overlapping request -> Parallel
parallel.push(metrics);
} else {
// Non-overlapping request -> Sequential
sequential.push(metrics);
}

// Update the last end time
lastEndTime = Math.max(lastEndTime, metrics.endTime);
}

return {parallel, sequential};
}

calculateAverageSpeedWithCategories = (
categorizedRequests: CategorizedRequests,
elapsedTimeInSeconds: number, // Observed total elapsed time in seconds
): { averageSpeedMbps: number; effectiveLatency: number } => {
const {parallel, sequential} = categorizedRequests;

// Step 1: Calculate total data size in bits
const totalDataBits = [...parallel, ...sequential].reduce((sum, req) => sum + (req.compressedSize * 8), 0);

// Step 2: Calculate effective latency
const parallelLatencies = parallel.map((req) => req.latency);
const sequentialLatencies = sequential.map((req) => req.latency);

const effectiveParallelLatency = Math.max(...parallelLatencies, 0);
const effectiveSequentialLatency = sequentialLatencies.reduce((sum, latency) => sum + latency, 0);

const effectiveLatency = effectiveParallelLatency + effectiveSequentialLatency;

// Step 3: Calculate data transfer time
const dataTransferTime = elapsedTimeInSeconds - (effectiveLatency / 1000);

// Handle edge case: if data transfer time is zero or negative
if (dataTransferTime <= 0) {
return {averageSpeedMbps: 0, effectiveLatency: 0};
}

// Step 4: Calculate average speed
const averageSpeedBps = totalDataBits / dataTransferTime; // Speed in bps
const averageSpeedMbps = averageSpeedBps / 1_000_000; // Convert to Mbps

return {
averageSpeedMbps,
effectiveLatency,
};
};
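The overlap rule in `categorizeRequests` above can be checked in isolation with a self-contained sketch (types simplified to just the two timestamps the rule needs):

```typescript
type Metrics = {startTime: number; endTime: number};

// Requests are sorted by start time; a request that starts before the latest
// end time seen so far overlaps an earlier one and is counted as parallel.
function categorize(requests: Metrics[]): {parallel: Metrics[]; sequential: Metrics[]} {
    const sorted = [...requests].sort((a, b) => a.startTime - b.startTime);
    const parallel: Metrics[] = [];
    const sequential: Metrics[] = [];
    let lastEndTime = 0;
    for (const m of sorted) {
        (m.startTime < lastEndTime ? parallel : sequential).push(m);
        lastEndTime = Math.max(lastEndTime, m.endTime);
    }
    return {parallel, sequential};
}

// r2 starts while r1 is still in flight, so it is parallel;
// r3 starts after both have ended, so it is sequential.
const r1 = {startTime: 0, endTime: 100};
const r2 = {startTime: 50, endTime: 120};
const r3 = {startTime: 150, endTime: 200};
console.log(categorize([r1, r2, r3]).parallel.length); // 1
console.log(categorize([r1, r2, r3]).sequential.length); // 2
```

Note that, as in the PR code, the first request of an overlapping pair is counted as sequential; only the later requests that land inside an earlier request's window count as parallel.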
Contributor

I've made a significant proposed improvement based on data observation. I will try to explain as much as possible in this PR: #8439.

* fix: improve parallel requests calculation

* fix test issue

* multiple parallelGroups testing

* calculateAverageSpeedWithCategories test

* categorizedRequests in it's own describe

* clean up duplicate tests

* check for when groupLabel is non-existent

* a case when groupLabel is non-existent

* more testing with effective latency.
Contributor

@rahimrahman rahimrahman left a comment


I've spent quite a bit of time updating tracking.ts in order to send telemetry data to the server. At this point, I think it is safe to say that it has been "tested", based on the changes and work I've done in my two PRs.

@rahimrahman rahimrahman requested a review from larkox January 7, 2025 22:55
Contributor

@larkox larkox left a comment


Looks great. A few comments here and there, but nothing big.

@@ -237,32 +239,51 @@ export async function resetMessageCount(serverUrl: string, channelId: string) {
}
}

export async function storeMyChannelsForTeam(serverUrl: string, teamId: string, channels: Channel[], memberships: ChannelMembership[], prepareRecordsOnly = false, isCRTEnabled?: boolean) {
const resolveAndflattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
Contributor

Suggested change
const resolveAndflattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
const resolveAndFlattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {

if (prepareRecordsOnly) {
return {models: flattenedModels};
}
return resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
Contributor

I always struggle a bit with how JavaScript handles this.

At this point, we are returning the promise and therefore exiting the try/catch block. So if an exception occurs inside the promise, it will not be handled here.

In this particular case, the catch will only catch the first promise resolution of prepareAllMyChannels, but not the resolution of resolveAndFlattenModels or the resolution of prepareChannels inside prepareAllMyChannels.

I think the linter understands you are inside a try/catch and allows you to do this:

Suggested change
return resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
return await resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
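The difference being discussed can be reproduced in a few lines (a minimal sketch; `failing`, `withAwait`, and `withoutAwait` are hypothetical names, not functions from this codebase):

```typescript
// An async helper that always rejects.
async function failing(): Promise<string> {
    throw new Error('boom');
}

// `return failing()` hands the still-pending promise back as-is;
// its rejection happens after we've already left the try/catch.
async function withoutAwait(): Promise<string> {
    try {
        return failing(); // rejection is NOT caught here
    } catch {
        return 'caught locally';
    }
}

// `return await failing()` unwraps the promise inside the try block,
// so the rejection surfaces here and the local catch handles it.
async function withAwait(): Promise<string> {
    try {
        return await failing(); // rejection IS caught here
    } catch {
        return 'caught locally';
    }
}

void (async () => {
    console.log(await withAwait()); // caught locally
    console.log(await withoutAwait().catch(() => 'caught by caller')); // caught by caller
})();
```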

Contributor

Sure, but would it get caught by the caller? In this case, /remote/channel.ts has a try/catch block there too.

Contributor

I've confirmed that the await is not needed, since the try..catch on the caller catches this.

Contributor

But should the caller be catching it? Shouldn't this function be the one catching it?

Contributor

The point is, there's a try..catch on the callers, so errors within the promise will bubble up and be caught by the callers.

Contributor

Looking at the function, there is nothing that shows it is going to throw. On the contrary, having the whole body of the function inside a try/catch block should be an indication that you don't expect this function to throw.

If the callers of this function have try/catch, great, it won't fail. But that doesn't mean this isn't wrong. On top of that, we will lose part of the context of what is failing, since the catch will be higher up instead of here, where it would clearly log that the problem is specifically in storeAllMyChannels.

Contributor

I made a new commit to add a try..catch in resolveAndFlattenModels. So if any of the models fail for whatever reason, we'll get a more isolated error output.


return {models: flattenedModels};
return resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);
Contributor

Same comment about returning promises

Suggested change
return resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);
return await resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);

Comment on lines +430 to +432
for (const teamId of teamIdsSet) {
categoriesPromises.push(client.getCategories('me', teamId, groupLabel));
}
Contributor

If we have an absurd number of teams, we are flooding the network layer with getCategories calls. It is not common, but we have already had reports of people having 100+ teams.

Unless we create a priority queue down at the network layer, we may want to consider batching this so other processes can access the network layer before this finishes.
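One way to bound the concurrency, as suggested here, is to fire the requests in fixed-size batches (a hedged sketch, not the project's actual implementation; `inBatches` and `fetchCategories` are illustrative names):

```typescript
// Runs `fn` over `items` in batches of `batchSize`, so the network layer is
// never flooded by hundreds of simultaneous calls.
async function inBatches<T, R>(
    items: T[],
    batchSize: number,
    fn: (item: T) => Promise<R>,
): Promise<R[]> {
    const results: R[] = [];
    for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        // Each batch resolves fully before the next one starts, leaving gaps
        // in which other callers can access the network layer.
        results.push(...await Promise.all(batch.map(fn)));
    }
    return results;
}

// Example: 100 team ids fetched 10 at a time (fetchCategories is hypothetical).
// const categories = await inBatches(teamIds, 10, (id) => fetchCategories(id));
```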

Contributor

Agreed, thanks for the observation. This is exactly what this PR is trying to establish: where the bottlenecks are, so we can learn from the data and prioritize the issues. We have a long way to go, but first, data.

Comment on lines +435 to +437
for (let i = 1; i <= remaining; i++) {
remainingMembershipsPromises.push(client.getAllMyChannelMembersFromAllTeams(i, General.CHANNEL_MEMBERS_CHUNK_SIZE, groupLabel));
}
Contributor

This for loop adds to the problem in my previous comment.

Comment on lines -85 to -90
const tabletDevice = isTablet();
const isActiveServer = (await getActiveServerUrl()) === serverUrl;
if (isActiveServer && tabletDevice && initialChannelId === currentChannelId) {
await markChannelAsRead(serverUrl, initialChannelId, false, groupLabel);
markChannelAsViewed(serverUrl, initialChannelId);
}
Contributor

Probably fine, but I wonder why we are removing this.

@rahimrahman rahimrahman added Build Apps for PR Build the mobile app for iOS and Android to test 3: QA Review Requires review by a QA tester labels Jan 9, 2025
@rahimrahman rahimrahman modified the milestones: v2.25.0, v2.24.0 Jan 9, 2025
@rahimrahman
Contributor

@yasserfaraazkhan I know you have so much on your plate already. This is another PR that I would love to get through for 2.25.0.

In config.json, please set the telemetry option CollectNetworkMetrics to true.

You should be able to see debug output once you've run the app with that setting enabled, or use Charles Proxy to see mobile telemetry sending data to the server, whether the server accepts it or not.

If you can spin up a server from my changes here: mattermost/mattermost#29601, you should be able to kill two birds with one stone: the request will show up in the log, and if you connect a Prometheus server to it, you should be able to see the output.

@rahimrahman
Contributor

/update-branch

@rahimrahman rahimrahman added Build Apps for PR Build the mobile app for iOS and Android to test and removed Build Apps for PR Build the mobile app for iOS and Android to test labels Jan 12, 2025
@amyblais amyblais added CherryPick/Approved Meant for the quality or patch release tracked in the milestone and removed CherryPick/Approved Meant for the quality or patch release tracked in the milestone labels Jan 13, 2025
Contributor

@yasserfaraazkhan yasserfaraazkhan left a comment


Verified using a /cloud instance with diagnostics enabled.
Checked the channel switching and opening threads flows; working as expected.

@amyblais amyblais added 4: Reviews Complete All reviewers have approved the pull request QA Review Done CherryPick/Approved Meant for the quality or patch release tracked in the milestone and removed 3: QA Review Requires review by a QA tester labels Jan 16, 2025
@rahimrahman rahimrahman force-pushed the request-metrics-improvements branch from 5f37436 to 57a432c Compare January 16, 2025 17:17
@rahimrahman rahimrahman added Build Apps for PR Build the mobile app for iOS and Android to test and removed Build Apps for PR Build the mobile app for iOS and Android to test labels Jan 16, 2025
@rahimrahman rahimrahman merged commit d1d3fda into main Jan 16, 2025
6 checks passed
@rahimrahman rahimrahman deleted the request-metrics-improvements branch January 16, 2025 20:55
@mattermost-build
Contributor

Cherry pick is scheduled.

@mattermost-build
Contributor

Error trying doing the automated Cherry picking. Please do this manually

+++ Updating remotes...
Fetching upstream
hostfile_replace_entries: mkstemp: Read-only file system
update_known_hosts: hostfile_replace_entries failed for /app/.ssh/known_hosts: Read-only file system
From github.com:mattermost/mattermost-mobile
 * [new branch]          fixTimezoneCrash -> upstream/fixTimezoneCrash
   2f3dfbbbf..d1d3fda83  main             -> upstream/main
   790219b9f..8f0328e14  mobile-draft-e2e -> upstream/mobile-draft-e2e
   22ed82051..1b5ec1e00  release-2.25     -> upstream/release-2.25
Fetching upstream
hostfile_replace_entries: mkstemp: Read-only file system
update_known_hosts: hostfile_replace_entries failed for /app/.ssh/known_hosts: Read-only file system
+++ Updating remotes done...
+++ Creating local branch automated-cherry-pick-of-mattermost-mobile-#8420-upstream-release-2.25-1737060933
Switched to a new branch 'automated-cherry-pick-of-mattermost-mobile-#8420-upstream-release-2.25-1737060933'
Branch 'automated-cherry-pick-of-mattermost-mobile-#8420-upstream-release-2.25-1737060933' set up to track remote branch 'release-2.25' from 'upstream'.

+++ About to attempt cherry pick of PR #8420 with merge commit d1d3fda83a6e5f561f4790ae3f375142d02f92c9.

CONFLICT (modify/delete): app/actions/websocket/posts.test.ts deleted in HEAD and modified in d1d3fda83 (Request metrics improvements (#8420)).  Version d1d3fda83 (Request metrics improvements (#8420)) of app/actions/websocket/posts.test.ts left in tree.
error: could not apply d1d3fda83... Request metrics improvements (#8420)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
!!! git cherry-pick failed

+++ Aborting in-progress git cherry-pick.

+++ Returning you to the master branch and cleaning up.

rahimrahman added a commit that referenced this pull request Jan 17, 2025
* Network metrics

* update network-client ref

* fix unit tests

* missing catch error parsing

* Replace API's to fetch all teams and channels for a user
Update groupLabel to use a type definition
Include effective Latency and Average Speed in the metrics
Include parallel and sequential request counts

* update network library and fix tests

* feat(MM-61865): send network request telemetry data to the server (#8436)

* feat(MM-61865): send network info telemetry data

* unit testing

* fix latency vs effectiveLatency

* cleanup test

* fix failing tests

* fix spelling error

* fix failing test

* more cleanup

* fix: improve parallel requests calculation (#8439)

* fix: improve parallel requests calculation

* fix test issue

* multiple parallelGroups testing

* calculateAverageSpeedWithCategories test

* categorizedRequests in it's own describe

* clean up duplicate tests

* check for when groupLabel is non-existent

* a case when groupLabel is non-existent

* more testing with effective latency.

* resolveAndFlattenModels with a capital F

* add try..catch within resolveAndFlattenModels

* update groupLabel to fix failing lint

* fix linting issue due to unknown group label

* forgot to push changes

* fix the indentation problem again

* add env var option for COLLECT_NETWORK_METRICS

---------

Co-authored-by: Rahim Rahman <[email protected]>
Co-authored-by: Mattermost Build <[email protected]>
@amyblais amyblais added CherryPick/Done Successfully cherry-picked to the quality or patch release tracked in the milestone and removed CherryPick/Approved Meant for the quality or patch release tracked in the milestone labels Jan 17, 2025
yasserfaraazkhan pushed a commit that referenced this pull request Jan 17, 2025